Baseball Card Valuation Trends

Jack Davis

2025-09-10

Why care about cards

Card collecting has become more interesting in recent years with the introduction of metal plate cards and of cards with pieces of memorabilia, like jerseys and bats, embedded in them.

Source: https://www.spudart.org/blog/relic-cards-photographed-on-jerseys/

The card market

Data Science

This is the first talk of the term, so let’s focus on the ‘how’.

First, I got a PDF of a 2024 copy of the Beckett price guide for cards. It contains tens of thousands of card prices.

Data Science

Each listing has the printed set that it came from in bold at the top (e.g., 2017 Panini Gold Standard), followed by the values of the individual cards (in mint condition).

Here we see that any common card from this set is worth between 60 cents and $1.50 USD, depending on retail location.

Any common autographed jersey piece card not otherwise listed by name below goes for between 3 and 8 USD. The "p/r 199" stands for a print run of 199: there were 199 copies of each of these cards, each with a piece of jersey embedded in it (COMMON JSY AU p/r 199).

Any of the 269 Ichiro cards is worth between 1.50 USD and 4 USD (Ichiro/269).

Data Science

This PDF wasn’t searchable because the pages were embedded as images, so I used optical character recognition (OCR) with the tesseract package in R to get the data into raw text.

Data Science

However, there was an issue: while the original Tesseract software can automatically detect when text is in columns, the R package that wraps Tesseract doesn’t seem to have that option. That could be workable in text processing, except the titles of the card sets aren’t the same font size as the card listings, so rows in neighbouring columns don’t line up and reading row by row across the page produces nonsense.

How can we split this into columns? Take a look at the average brightness of each vertical column of pixels on a page, plotted from left to right. The downward spikes correspond to the vertical column bars printed on the page.
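The brightness idea can be sketched in a few lines. This is a stand-alone Python illustration on a toy image, not the R code used for the talk; the function names, the list-of-rows image format and the threshold of 128 are all assumptions made for the example.

```python
def column_means(image):
    """Average brightness of each pixel column.

    `image` is a list of rows, each row a list of 0-255 grey values
    (a hypothetical in-memory format chosen for this sketch).
    """
    n_rows = len(image)
    return [sum(col) / n_rows for col in zip(*image)]


def find_dark_columns(means, threshold):
    """Centre index of every run of columns darker than `threshold`."""
    centres, run_start = [], None
    # The extra value at the end closes a run that touches the right edge.
    for x, m in enumerate(means + [threshold]):
        if m < threshold and run_start is None:
            run_start = x
        elif m >= threshold and run_start is not None:
            centres.append((run_start + x - 1) // 2)
            run_start = None
    return centres


def split_at(image, cuts):
    """Cut the image into vertical strips at the given x positions."""
    edges = [0] + cuts + [len(image[0])]
    return [[row[a:b] for row in image] for a, b in zip(edges, edges[1:])]


# Toy page: white (255) with two dark vertical bars at x = 3 and x = 7.
page = [[0 if x in (3, 7) else 255 for x in range(11)] for _ in range(4)]
cuts = find_dark_columns(column_means(page), threshold=128)
strips = split_at(page, cuts)
```

On a real scan the printed text also darkens its columns, so the threshold would have to sit below the text level and above the level of the bars; that value has to be read off the brightness plot for the page at hand.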

Data Science

By splitting the image at the downward spikes, and applying OCR and some light text processing, we get text like this:

Data Science

Applying data wrangling and some subject knowledge, we get a data frame with the card prices and player and card set names.
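As a rough idea of what that wrangling involves, here is a minimal Python sketch (the talk itself used R). The sample text is a guess at what one OCR'd listing might look like, built from the price guide notation described earlier; the exact line layout, the regular expressions and the field names are assumptions.

```python
import re

# "<label> <low price> <high price>", e.g. "Ichiro/269 1.50 4.00"
PRICE_LINE = re.compile(
    r"^(?P<label>.+?)\s+(?P<low>\d*\.\d{2})\s+(?P<high>\d*\.\d{2})$"
)
# A set title starts with a four-digit year, e.g. "2017 Panini Gold Standard"
SET_HEADER = re.compile(r"^\d{4}\s+.+$")
# Print run written as "Name/269" or "... p/r 199"
PRINT_RUN = re.compile(r"(?:/|p/r\s+)(?P<run>\d+)$")


def parse_listing(text):
    """Turn raw listing text into one record per priced line."""
    records, current_set = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        priced = PRICE_LINE.match(line)
        if priced:
            label = priced["label"]
            run = PRINT_RUN.search(label)
            records.append({
                "set": current_set,
                "label": label,
                "print_run": int(run["run"]) if run else None,
                "low": float(priced["low"]),
                "high": float(priced["high"]),
            })
        elif SET_HEADER.match(line):
            current_set = line  # applies to all priced lines that follow
    return records


# Hypothetical OCR output for one listing.
SAMPLE = """2017 Panini Gold Standard
COMMON CARD .60 1.50
COMMON JSY AU p/r 199 3.00 8.00
Ichiro/269 1.50 4.00
"""
records = parse_listing(SAMPLE)
```

Real OCR output will contain misread characters, so in practice most of the effort goes into cleaning lines before a pattern like this can match them.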

Data Science

…and by merging in the WAR (wins above replacement) data, as well as career data like year and lifetime earnings from an additional pre-scraped source, we have all the information we need to do a simple analysis.

https://github.com/Neil-Paine-1/MLB-WAR-data-historical
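The merge step amounts to a join on player name. Below is a small Python sketch of that idea (not the R code from the talk). The field names `player`, `career_war` and `last_year` are hypothetical and do not come from the linked repository, and the rows are toy values, not real statistics.

```python
def normalise(name):
    """Lower-case and collapse whitespace so OCR spacing quirks still match."""
    return " ".join(name.lower().split())


def merge_war(cards, war_rows):
    """Attach career data to each card; also report names with no match."""
    war_by_player = {normalise(row["player"]): row for row in war_rows}
    merged, unmatched = [], []
    for card in cards:
        hit = war_by_player.get(normalise(card["player"]))
        if hit is None:
            unmatched.append(card["player"])
        else:
            merged.append({
                **card,
                "career_war": hit["career_war"],
                "last_year": hit["last_year"],
            })
    return merged, unmatched


# Toy rows with made-up players and numbers.
cards = [
    {"player": "Player  A", "high": 4.0},
    {"player": "Player C", "high": 1.5},
]
war_rows = [
    {"player": "player a", "career_war": 12.5, "last_year": 2019},
    {"player": "Player B", "career_war": 3.0, "last_year": 2001},
]
merged, unmatched = merge_war(cards, war_rows)
```

Keeping the list of unmatched names is a deliberate choice here: with OCR'd names, a join that silently drops rows would hide how many cards were lost.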

The Trend

First, let’s look at the title trend and compare the price of the cards to career WAR. Maybe a quantile regression would work better for this. Notice that I used the Q3 (third quartile) of the higher retail price. That’s because I wanted to reduce the effect of the extremely high-priced cards, as well as weed out a lot of the junk that gets printed, especially of popular players.

Also, WAR is on a log scale.
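To show why a quartile resists extreme prices, here is a minimal Python sketch (again, not the talk's R code). It assumes the Q3 is taken per player over that player's higher retail prices, and that players with zero or negative WAR are dropped before taking the logarithm; neither detail is stated in the talk, and the data below are toy values.

```python
import math
import statistics
from collections import defaultdict


def third_quartile(values):
    """Q3 of a list of prices; a single price is returned unchanged."""
    if len(values) == 1:
        return values[0]
    return statistics.quantiles(values, n=4, method="inclusive")[2]


def price_vs_log_war(cards, war_by_player):
    """One (log10 WAR, Q3 of high price) point per player."""
    highs = defaultdict(list)
    for card in cards:
        highs[card["player"]].append(card["high"])
    points = {}
    for player, prices in highs.items():
        war = war_by_player.get(player)
        if war is None or war <= 0:
            continue  # assumption: log scale cannot show WAR <= 0
        points[player] = (math.log10(war), third_quartile(prices))
    return points


# Toy data: one very expensive card among otherwise cheap ones.
cards = [{"player": "Player A", "high": p} for p in (1, 2, 3, 4, 100)]
cards.append({"player": "Player B", "high": 5})
points = price_vs_log_war(cards, {"Player A": 10.0, "Player B": -1.0})
```

For the toy Player A the mean price is 22, while the Q3 is 4, so the single 100-unit card has no effect on the plotted value.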

The Trend

Speaking of printing a lot of cards of popular players, the trend for the number of individually listed cards (i.e., ones that aren’t bundled into the ‘common’ category) against WAR is a lot clearer.

The Trend

We also have the year that the player last played as a predictor of price. I was actually looking for a declining line, which would suggest that older cards of older players are worth more. There didn’t seem to be a trend here.

The Trend

Looking farther out, there’s only a very slight downward slope, and it reverses when we reach active players who have played in the last year.

Still to do

See Also

https://www.cardladder.com/

https://www.pricecharting.com/trading-cards

https://www.psacard.com/priceguide